@FazeelUsmani
Contributor

This PR adds a new configuration option linkcheck_ignore_case to enable case-insensitive URL and anchor checking in the linkcheck builder.

Problem

Some web servers (e.g., GitHub, certain hosting platforms) are case-insensitive and may return URLs with different casing than the original link. This causes the linkcheck builder to report false-positive redirects when the URLs differ only in case, even though they point to the same resource.

Solution

  • Added linkcheck_ignore_case boolean configuration option (default: False)
  • Modified URL comparison logic to support case-insensitive matching when enabled
  • Modified anchor comparison in AnchorCheckParser to support case-insensitive matching when enabled
  • Added comprehensive tests for both URL and anchor case-insensitive checking
  • Updated documentation in doc/usage/configuration.rst
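The comparison described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function name `classify_response` and its `ignore_case` parameter are hypothetical stand-ins for the boolean option proposed here.

```python
def classify_response(request_url: str, response_url: str, ignore_case: bool = False) -> str:
    """Classify a checked link as 'working' or 'redirected'.

    Hypothetical helper: `ignore_case` mirrors the boolean option proposed
    in this PR description; it is not Sphinx's actual implementation.
    """
    if request_url == response_url:
        return "working"
    # With the option enabled, URLs that differ only in letter case
    # are treated as the same resource rather than as a redirect.
    if ignore_case and request_url.casefold() == response_url.casefold():
        return "working"
    return "redirected"


# Default behaviour: a case-only difference is still reported as a redirect.
print(classify_response("https://example.org/Docs", "https://example.org/docs"))
# Option enabled: the same pair is treated as a successful check.
print(classify_response("https://example.org/Docs", "https://example.org/docs", ignore_case=True))
```

A genuine redirect to a different path is still reported either way, since only letter case is ignored.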

@FazeelUsmani FazeelUsmani marked this pull request as draft November 7, 2025 15:08
@jayaddison
Contributor

Hi @FazeelUsmani - thank you for developing and describing this pull request.

I have a concern that enabling the option reduces the precision of other hyperlinks that are checked.

Could you explain the use case where it would be easier for a documentation project to enable this option by editing the conf.py instead of fixing the URLs/anchors in their documentation sources to use the correct casing?

@FazeelUsmani FazeelUsmani marked this pull request as ready for review November 10, 2025 08:56
@FazeelUsmani
Contributor Author

That’s a good point, @jayaddison.
This option is off by default and is meant mainly for large or older docs where many links hit servers that normalise URL casing (like GitHub) or are case-insensitive (like Windows-based hosts). Enabling it just filters out harmless casing-related redirects so teams can focus on real link issues instead of noise.

@jayaddison
Contributor

@FazeelUsmani got it, understood. As often happens, I had a misunderstanding to begin with - you are saying that this only affects whether case-adjusted response URLs are considered to be redirects instead of successful responses.

Let me think about this a little more; I do understand the value in this now, but am wary of (and trying to think of) any problem side-effects.

@jayaddison
Contributor

(also, thank you for the explanation)

@jayaddison
Contributor

Separately: I do think that we should probably isolate the redirect-case-sensitivity handling from the HTML anchor case-sensitivity; they seem fairly functionally different from each other to me.

@FazeelUsmani
Contributor Author

Hmm.. makes sense. I can refactor this into two separate options:
linkcheck_ignore_case_urls: For comparing URL paths (the redirect scenario)
linkcheck_ignore_case_anchors: For comparing HTML anchors

This would give users more granular control. Most users would likely want linkcheck_ignore_case_urls = True (for case-insensitive servers) while keeping linkcheck_ignore_case_anchors = False (since HTML IDs are technically case-sensitive per spec). What do you say?

@AA-Turner
Member

AA-Turner commented Nov 10, 2025

Two options seem overkill for this use-case. What do browsers do de facto on case mismatches on fragment IDs?

A

@FazeelUsmani
Contributor Author

Fair point — browsers generally treat fragment IDs as case-sensitive, though behavior can vary depending on the HTML generator. My thought was mainly to avoid false negatives in edge cases (like auto-generated anchors that normalize casing differently).
That said, I’m fine keeping it as a single option if we note the anchor behavior clearly in the docs.

@jayaddison
Contributor

I can't think of drawbacks to the redirect case-folding -- and although it's maybe slightly controversial, I wonder whether we should enable it by default.

The anchor-checking I'm less certain about; given that browsers seem to navigate to anchors case-sensitively (something I too checked locally, and that is certainly the case in Firefox 140.4), I'd be reluctant to offer that without a demonstrable use-case (again, one that can't be solved easily by fixing the source documentation).

@FazeelUsmani
Contributor Author

That makes sense — I’ll keep it as a single linkcheck_ignore_case option limited to the URL path. Anchor checks will remain case-sensitive to align with browser behavior, and I’ll clarify this distinction in the docs so users understand the expected behavior.
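To illustrate what "anchor checks remain case-sensitive" means in practice, here is a small sketch built on the standard library's `html.parser`. The class `AnchorFinder` and the helper `has_anchor` are hypothetical names for illustration only; they are not the `AnchorCheckParser` from Sphinx.

```python
from html.parser import HTMLParser


class AnchorFinder(HTMLParser):
    """Hypothetical parser that looks for an exact (case-sensitive) anchor."""

    def __init__(self, anchor: str) -> None:
        super().__init__()
        self.anchor = anchor
        self.found = False

    def handle_starttag(self, tag, attrs):
        # Attribute names arrive lowercased; attribute values keep their case.
        for key, value in attrs:
            if key in ("id", "name") and value == self.anchor:
                self.found = True


def has_anchor(html: str, anchor: str) -> bool:
    parser = AnchorFinder(anchor)
    parser.feed(html)
    parser.close()
    return parser.found


page = '<h2 id="Install">Installing</h2>'
print(has_anchor(page, "Install"))  # exact match on the id value
print(has_anchor(page, "install"))  # differs in case, so no match
```

Because the comparison uses plain string equality, `#install` does not match an element whose `id` is `Install`, which is consistent with the browser behaviour discussed in this thread.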

@jayaddison
Contributor

Sounds good to me! Thanks @FazeelUsmani.

@FazeelUsmani FazeelUsmani marked this pull request as draft November 11, 2025 13:13
@FazeelUsmani FazeelUsmani force-pushed the linkcheck_case_insensitive branch 3 times, most recently from 56d6a63 to d115b1e November 11, 2025 14:31
Contributor

@jayaddison jayaddison left a comment

This feature implementation looks good to me; thank you @FazeelUsmani!

@FazeelUsmani
Contributor Author

Hi @jayaddison,
Thank you for approving my changes. How does the merging process work here?

@jayaddison
Contributor

Hi @jayaddison, Thank you for approving my changes. How does the merging process work here?

You're welcome. I think the best response I can offer to answer that -- and I think you have followed most/all of the guidance there already -- is to refer to the Sphinx official contributing guide: https://www.sphinx-doc.org/en/master/internals/contributing.html#contribute-code

@FazeelUsmani
Contributor Author

@jayaddison Done! I've implemented all three cleanup suggestions

@@ -0,0 +1,3 @@
`path1 <http://localhost:7777/path1>`_

`path2 <http://localhost:7777/path2>`_

This comment was marked as resolved.

@jayaddison
Contributor

@jayaddison Done! I've implemented all three cleanup suggestions

That looks good to me - much neater, I think! Thanks again.

Contributor

@jayaddison jayaddison left a comment

Looks great - thank you, @FazeelUsmani!

@AA-Turner
Member

Thanks @jayaddison @FazeelUsmani. I've pushed some tweaks to the implementation & renamed the config option to linkcheck_case_insensitive_urls.

A

@AA-Turner AA-Turner changed the title from "Add linkcheck_case_sensitive configuration option" to "linkcheck: Allow case-insensitive URL comparisons" Nov 24, 2025
@AA-Turner AA-Turner merged commit c06bfce into sphinx-doc:master Nov 24, 2025
30 checks passed
@AA-Turner AA-Turner added this to the 9.0.0 milestone Nov 25, 2025
@jayaddison
Contributor

A note that the test_linkcheck_case_sensitivity test (with its 3x varying inputs) seems to account for ~14s of the test suite runtime; put another way, its cases are 3 of the 5 slowest tests when run in GitHub CI: https://github.com/sphinx-doc/sphinx/actions/runs/19749523166/job/56590035331?pr=14107#step:9:2482

That surprised me - I felt like those tests should run pretty quickly. Other linkcheck unit tests aren't exactly speedy either (~1s to ~2s is not atypical), but these ones seem noticeably longer duration.

@jayaddison
Contributor

...hmm; maybe those longer durations are particularly present when running in GitHub CI on Windows. I'll try to determine whether that's the case, and more about it generally, soon.

@jayaddison
Contributor

...hmm; maybe those longer durations are particularly present when running in GitHub CI on Windows. I'll try to determine whether that's the case, and more about it generally, soon.

My current best-guess is that these extended test durations are due to the socket.setdefaulttimeout(...) call that occurs in the test_connection_contention test case that appears above the case-sensitivity tests in the test_build_linkcheck.py file.

If so, that would mean that the cause of the long-duration tests is not really related to the logic in this pull request; it's simply a side-effect of the fact that these tests were added below the socket timeout config adjustment.

We could test this theory by relocating test_connection_contention to the end of the file -- if validated, a better fix would involve undoing/resetting the socket timeout during the test teardown.
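One way the "undo/reset during teardown" idea might look is sketched below. It assumes the leak comes from the process-wide default set by `socket.setdefaulttimeout(...)`; the helper name `temporary_default_timeout` is hypothetical and not part of the Sphinx test suite.

```python
import socket
from contextlib import contextmanager


@contextmanager
def temporary_default_timeout(seconds):
    """Set the global socket default timeout, restoring the old value on exit.

    Hypothetical helper: illustrates the teardown idea only.
    """
    previous = socket.getdefaulttimeout()
    socket.setdefaulttimeout(seconds)
    try:
        yield
    finally:
        # Runs even if the body raises, so later tests see the original value.
        socket.setdefaulttimeout(previous)


before = socket.getdefaulttimeout()
with temporary_default_timeout(0.5):
    inside = socket.getdefaulttimeout()
after = socket.getdefaulttimeout()
print(before, inside, after)
```

The same save-and-restore pattern could be wrapped in a pytest fixture so that the timeout change stays scoped to the one test that needs it.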
